The SVM With Uneven Margins And Chinese Document Categorisation
Abstract
We propose and study a new variant of the SVM, the SVM with uneven margins, tailored for document categorisation problems (i.e. problems where classes are highly unbalanced). In our experiments, the new algorithm significantly outperformed the standard SVM on document categorisation for small categories. Furthermore, we report the results of both the SVM and our new algorithm on the Reuters Chinese corpus for document categorisation, which we believe are the first results on this new Chinese corpus.
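The abstract does not spell out the optimisation problem, so the following is only a minimal sketch of the idea under stated assumptions. It assumes the uneven margin is expressed through a hypothetical parameter `tau`: positive examples must score at least 1, while negative examples need only score at most `-tau`. It also swaps in plain subgradient descent on a hinge-style loss for a proper quadratic-programming solver. The function names and default values are illustrative and are not taken from the paper.

```python
import random


def train_uneven_margin_svm(X, y, tau=0.5, C=1.0, lr=0.05, epochs=200, seed=0):
    """Subgradient descent on an assumed uneven-margin objective:

        0.5 * ||w||^2 + C * sum_i max(0, m(y_i) - y_i * (w . x_i + b))

    with m(+1) = 1 and m(-1) = tau (tau is an illustrative parameter).
    Labels in y must be +1 or -1.
    """
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi, yi = X[i], y[i]
            required = 1.0 if yi > 0 else tau  # class-dependent margin
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            # Regulariser step, scaled by 1/n so one epoch is one full pass.
            w = [wj - lr * wj / n for wj in w]
            # Hinge step: only examples inside their required margin contribute.
            if yi * score < required:
                w = [wj + lr * C * yi * xj for wj, xj in zip(w, xi)]
                b += lr * C * yi
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1
```

With `tau = 1` the sketch reduces to an ordinary soft-margin linear SVM trained by subgradient descent. Choosing `tau < 1` relaxes the margin required of the (typically much larger) negative class, which under these assumptions shifts the decision boundary away from the few positive examples of a small category.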
Similar resources
The Perceptron Algorithm with Uneven Margins
The perceptron algorithm with margins is a simple, fast and effective learning algorithm for linear classifiers; it produces decision hyperplanes within some constant ratio of the maximal margin. In this paper we study this algorithm and a new variant: the perceptron algorithm with uneven margins, tailored for document categorisation problems (i.e. problems where classes are highly unbalanced a...
Using Uneven Margins SVM and Perceptron for Information Extraction
The classification problem derived from information extraction (IE) has an imbalanced training set. This is particularly true when learning from smaller datasets which often have a few positive training examples and many negative ones. This paper takes two popular IE algorithms – SVM and Perceptron – and demonstrates how the introduction of an uneven margins parameter can improve the results on...
SVM Categorizer: A Generic Categorization Tool Using Support Vector Machines
Supervised text categorisation is a significant tool considering the vast amount of structured, unstructured, or semi-structured texts that are available from internal or external enterprise resources. The goal of supervised text categorisation is to assign text documents to finite pre-specified categories in order to extract and automatically organise information coming from these resources...
SVM Based Learning System for Information Extraction
This paper presents an SVM-based learning system for information extraction (IE). One distinctive feature of our system is the use of a variant of the SVM, the SVM with uneven margins, which is particularly helpful for small training datasets. In addition, our approach needs fewer SVM classifiers to be trained than other recent SVM-based systems. The paper also compares our approach to several ...